perturbation space
Towards Efficient Formal Verification of Spiking Neural Network
Seong, Baekryun, Kim, Jieung, Ko, Sang-Ki
Recently, AI research has primarily focused on large language models (LLMs), and increasing accuracy often involves scaling up and consuming more power. The power consumption of AI has become a significant societal issue; in this context, spiking neural networks (SNNs) offer a promising solution. SNNs operate event-driven, like the human brain, and compress information temporally. These characteristics allow SNNs to significantly reduce power consumption compared to perceptron-based artificial neural networks (ANNs), highlighting them as a next-generation neural network technology. However, societal concerns regarding AI go beyond power consumption, with the reliability of AI models being a global issue. For instance, adversarial attacks on AI models are a well-studied problem in the context of traditional neural networks. Despite their importance, the stability and property verification of SNNs remains in the early stages of research. Most SNN verification methods are time-consuming and barely scalable, making practical applications challenging. In this paper, we introduce temporal encoding to achieve practical performance in verifying the adversarial robustness of SNNs. We conduct a theoretical analysis of this approach and demonstrate its success in verifying SNNs at previously unmanageable scales. Our contribution advances SNN verification to a practical level, facilitating the safer application of SNNs.
A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness
Zhang, Yuhao, Albarghouthi, Aws, D'Antoni, Loris
This paper reveals a key insight that a one-layer decoder-only Transformer is equivalent to a two-layer Recurrent Neural Network (RNN). Building on this insight, we propose ARC-Tran, a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces. Compared to ARC-Tran, current robustness verification techniques are limited either to specific and length-preserving perturbations like word substitutions or to recursive models like LSTMs. ARC-Tran addresses these limitations by meticulously managing position encoding to prevent mismatches and by utilizing our key insight to achieve precise and scalable verification. Our evaluation shows that ARC-Tran (1) trains models more robust to arbitrary perturbation spaces than those produced by existing techniques and (2) shows high certification accuracy of the resulting models.
PECAN: A Deterministic Certified Defense Against Backdoor Attacks
Zhang, Yuhao, Albarghouthi, Aws, D'Antoni, Loris
Neural networks are vulnerable to backdoor poisoning attacks, where the attackers maliciously poison the training set and insert triggers into the test input to change the prediction of the victim model. Existing defenses for backdoor attacks either provide no formal guarantees or come with expensive-to-compute and ineffective probabilistic guarantees. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques on a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN can (1) significantly outperform the state-of-the-art certified backdoor defense, both in defense strength and efficiency, and (2) on real back-door attacks, PECAN can reduce attack success rate by order of magnitude when compared to a range of baselines from the literature.
Evaluating Adversarial Robustness with Expected Viable Performance
McCoppin, Ryan, Dawson, Colin, Kennedy, Sean M., Blaha, Leslie M.
Abstract-- We introduce a metric for evaluating the robustness of a classifier, with particular attention to adversarial perturbations, in terms of expected functionality with respect to possible adversarial perturbations. Defining robustness in terms of an expected value is motivated by a domain general approach to robustness quantification. In support of this, it is desirable to have an approach Adversarial ML arises when some aspect of the system is to quantifying robustness that applies generally across intentionally manipulated to cause the classifier to make errors. Adversarial robustness specifically seeks to measure a model's We adopt the perspective that robustness is the ability of an performance when these perturbations are chosen selectively ML system to maintain its functionality at an acceptable level to be maximally disruptive. For example, evasion attacks add of performance when some aspect of the system is subject to human-imperceptible perturbations to a data instance to alter perturbation, consistent with the domain-general definition laid the output of a classifier, as illustrated in Figure 1.
Expressivity of Graph Neural Networks Through the Lens of Adversarial Robustness
Campi, Francesco, Gosch, Lukas, Wollschlรคger, Tom, Scholten, Yan, Gรผnnemann, Stephan
We perform the first adversarial robustness study into Graph Neural Networks (GNNs) that are provably more powerful than traditional Message Passing Neural Networks (MPNNs). In particular, we use adversarial robustness as a tool to uncover a significant gap between their theoretically possible and empirically achieved expressive power. To do so, we focus on the ability of GNNs to count specific subgraph patterns, which is an established measure of expressivity, and extend the concept of adversarial robustness to this task. Based on this, we develop efficient adversarial attacks for subgraph counting and show that more powerful GNNs fail to generalize even to small perturbations to the graph's structure. Expanding on this, we show that such architectures also fail to count substructures on out-of-distribution graphs.
RNN-Guard: Certified Robustness Against Multi-frame Attacks for Recurrent Neural Networks
Zhang, Yunruo, Du, Tianyu, Ji, Shouling, Tang, Peng, Guo, Shanqing
It is well-known that recurrent neural networks (RNNs), although widely used, are vulnerable to adversarial attacks including one-frame attacks and multi-frame attacks. Though a few certified defenses exist to provide guaranteed robustness against one-frame attacks, we prove that defending against multi-frame attacks remains a challenging problem due to their enormous perturbation space. In this paper, we propose the first certified defense against multi-frame attacks for RNNs called RNN-Guard. To address the above challenge, we adopt the perturb-all-frame strategy to construct perturbation spaces consistent with those in multi-frame attacks. However, the perturb-all-frame strategy causes a precision issue in linear relaxations. To address this issue, we introduce a novel abstract domain called InterZono and design tighter relaxations. We prove that InterZono is more precise than Zonotope yet carries the same time complexity. Experimental evaluations across various datasets and model structures show that the certified robust accuracy calculated by RNN-Guard with InterZono is up to 2.18 times higher than that with Zonotope. In addition, we extend RNN-Guard as the first certified training method against multi-frame attacks to directly enhance RNNs' robustness. The results show that the certified robust accuracy of models trained with RNN-Guard against multi-frame attacks is 15.47 to 67.65 percentage points higher than those with other training methods.
Robustness to Programmable String Transformations via Augmented Abstract Training
Zhang, Yuhao, Albarghouthi, Aws, D'Antoni, Loris
Deep neural networks for natural language processing tasks are vulnerable to adversarial input perturbations. In this paper, we present a versatile language for programmatically specifying string transformations -- e.g., insertions, deletions, substitutions, swaps, etc. -- that are relevant to the task at hand. We then present an approach to adversarially training models that are robust to such user-defined string transformations. Our approach combines the advantages of search-based techniques for adversarial training with abstraction-based techniques. Specifically, we show how to decompose a set of user-defined string transformations into two component specifications, one that benefits from search and another from abstraction. We use our technique to train models on the AG and SST2 datasets and show that the resulting models are robust to combinations of user-defined transformations mimicking spelling mistakes and other meaning-preserving transformations.
5.8 Scoped Rules (Anchors) Interpretable Machine Learning
Ignoring hyperparameters helps reduce the boundary's complexity (see original paper for more info). Since the MAB extracts the \(B\) best out of \(B \cdot p\) candidates in each round, most MABs and their runtimes multiply the \(p 2\) factor more than any other parameter. It thus becomes apparent: the algorithm's efficiency decreases with feature abundant problems. Tabular data is structured data represented by tables, wherein columns embody features and rows instances. For instance, we use the bike rental data to demonstrate the anchors approach's potential to explain ML predictions for selected instances. For this, we turn the regression into a classification problem and train a random forest as our black-box model.